Introduction

In this notebook, we will use Python Materials Genomics (pymatgen, http://www.pymatgen.org), written by Professor Shyue Ping Ong, to mine the Materials Project (https://www.materialsproject.org) for statistics on space groups. The Materials Project is a large database of computed data on known inorganic materials. Unlike many other large databases, the Materials Project uses a structure matching algorithm to filter out duplicate structures. Because each distinct structure is then counted only once, the resulting statistics are more robust.


In [1]:
# Note that this notebook requires a number of dependencies.
# pymatgen is used for the high-level interface to the Materials Project.
# The excellent prettyplotlib is used to make nice plots.
# prettytable is used to make nice ASCII tables.
from pymatgen import MPRester, Composition
import prettyplotlib as ppl
import brewer2mpl
import matplotlib.pyplot as plt
from pymatgen.util.plotting_utils import get_publication_quality_plot
import numpy as np
from prettytable import PrettyTable
from symmetry.groups import sg_symbol_from_int_number

%matplotlib inline

Let us use pymatgen's MPRester to query for all formulas and spacegroup information using the Materials Project's RESTful API.


In [2]:
m = MPRester()
data = m.mpquery({}, ["pretty_formula", "spacegroup.number"])
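
The later cells index each entry as `d["spacegroup.number"]`, so the query result is treated as a list of dictionaries keyed by the requested property names. The sketch below mimics that shape with a few hand-written stand-in records, which is convenient for trying the analysis offline. The records and the helper name `extract_sgnums` are assumptions made for illustration, not output of the Materials Project API.

```python
# Hand-written stand-in records (NOT actual query output). They mimic the
# structure the later cells assume: one dict per entry, keyed by the
# requested property names.
sample_data = [
    {"pretty_formula": "NaCl", "spacegroup.number": 225},  # rock salt, Fm-3m
    {"pretty_formula": "TiO2", "spacegroup.number": 136},  # rutile, P4_2/mnm
    {"pretty_formula": "ZnS", "spacegroup.number": 216},   # zinc blende, F-43m
    {"pretty_formula": "CaF2", "spacegroup.number": 225},  # fluorite, Fm-3m
]


def extract_sgnums(records):
    """Return the space group numbers, checking that each lies in 1..230."""
    sgnums = []
    for rec in records:
        num = rec["spacegroup.number"]
        if not 1 <= num <= 230:
            raise ValueError("invalid space group number %r for %s"
                             % (num, rec["pretty_formula"]))
        sgnums.append(num)
    return sgnums


sample_sgnums = extract_sgnums(sample_data)
print(sample_sgnums)
```

Validating the range up front keeps a malformed entry from silently falling outside the histogram bins used later.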

Distribution of Space Groups

We will now perform a simple histogram analysis of the distribution of space groups.


In [3]:
sgnums = [d["spacegroup.number"] for d in data]
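# density=True is the current name for the deprecated `normed` argument.
# With unit-width bins, each value is the fraction of entries in that space group.
n, bins = np.histogram(sgnums, bins=np.arange(0.5, 231.5, 1), density=True)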
crystal_systems = [(2, "Triclinic"), (15, "Monoclinic"), (74, "Orthorhombic"),
                   (142, "Tetragonal"), (167, "Trigonal"), (194, "Hexagonal"), (230, "Cubic")]
colors = brewer2mpl.get_map('Set1', 'qualitative', len(crystal_systems)).mpl_colors
start = 0
for i, (v, label) in enumerate(crystal_systems):
    ppl.bar(np.arange(start + 0.5, v + 0.5, 1), n[start:v] * 100, width=1, label=label, color=colors[i])
    start = v
plt.xlim([0, 231])
plt = get_publication_quality_plot(12, 8, plt)
plt.legend([c[1] for c in crystal_systems])
l = plt.xlabel("Int. Space Group No.")
l = plt.ylabel("% of Compounds")
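
The coloring in the plot relies on the fact that the international space group numbers are ordered by crystal system, so each system is a contiguous range ending at the upper bounds listed in `crystal_systems`. As a minimal sketch of the same lookup for a single number, assuming only the Python standard library (the function name `crystal_system` is introduced here for illustration):

```python
from bisect import bisect_left

# Highest space group number in each crystal system, as in the cell above.
CRYSTAL_SYSTEMS = [(2, "Triclinic"), (15, "Monoclinic"), (74, "Orthorhombic"),
                   (142, "Tetragonal"), (167, "Trigonal"), (194, "Hexagonal"),
                   (230, "Cubic")]
UPPER_BOUNDS = [upper for upper, _ in CRYSTAL_SYSTEMS]


def crystal_system(sg_number):
    """Return the crystal system name for an international space group number."""
    if not 1 <= sg_number <= 230:
        raise ValueError("space group number must be in 1..230")
    # bisect_left finds the first upper bound that is >= sg_number.
    return CRYSTAL_SYSTEMS[bisect_left(UPPER_BOUNDS, sg_number)][1]


for number in (2, 14, 62, 225):
    print(number, crystal_system(number))
```

A binary search over the upper bounds avoids writing out a chain of range comparisons and keeps the boundaries in one place.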


Most and least common Space Groups

Let's rank the space groups by prevalence and plot the 20 most common ones. The groups at the other end of the same ranking are the least common.


In [4]:
def make_plot(hist_data):
    """Plot the 20 most common space groups; hist_data[i - 1] is the fraction for group i."""
    labels = []
    y = []
    for i in sorted(range(1, 231), key=lambda i: hist_data[i - 1], reverse=True)[0:20]:
        labels.append("%d - %s" % (i, sg_symbol_from_int_number(i)))
        y.append(hist_data[i - 1] * 100)
    labels.reverse()
    y.reverse()
    ppl.barh(range(20), y, height=1, annotate=True, grid='x', color=colors[1])
    global plt
    plt = get_publication_quality_plot(12, 8, plt)
    plt.yticks(np.arange(20) + 0.5, labels, fontsize=16)
    plt.xticks(fontsize=16)
    plt.xlabel("% of compounds", fontsize=16)
    plt.title('Most common space groups', fontsize=16)
    
# density=True is the current name for the deprecated `normed` argument.
n, bins = np.histogram(sgnums, bins=np.arange(0.5, 231.5, 1), density=True)
make_plot(n)
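
The plot shows only the top of the ranking. The following sketch ranks space groups directly from a list of space group numbers and returns both ends of the ranking, including groups that have no entries at all. It uses `collections.Counter` rather than the NumPy histogram, and the function name `rank_space_groups` and the toy input are assumptions made for illustration, not Materials Project data.

```python
from collections import Counter


def rank_space_groups(sgnums, k=5):
    """Return (most_common, least_common) lists of (number, percent) pairs."""
    if not sgnums:
        raise ValueError("need at least one space group number")
    counts = Counter(sgnums)
    total = len(sgnums)
    # Include all 230 groups so that groups with zero entries also appear.
    percents = [(sg, 100.0 * counts.get(sg, 0) / total) for sg in range(1, 231)]
    # Sort by descending percentage; ties are broken by ascending number.
    ranked = sorted(percents, key=lambda item: (-item[1], item[0]))
    return ranked[:k], ranked[-k:]


# Hypothetical toy input, not Materials Project data.
toy_sgnums = [14, 14, 14, 2, 2, 62, 225, 225, 225, 225]
most, least = rank_space_groups(toy_sgnums, k=3)
print("most common:", most)
print("least common:", least)
```

With real data many groups may tie at or near zero, so the "least common" end of the ranking depends on the tie-breaking rule, which is made explicit here.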


Compare the results above with the tables provided in "Properties of Materials: Anisotropy, Symmetry, Structure" by Robert E. Newnham.

More advanced analysis

Let us now go beyond simple histogram analysis and look at the statistics for specific chemistries. For example, we can ask: do oxides, sulfides, fluorides and chlorides prefer different space groups? Let us first group the data by the most electronegative element, which serves as a proxy for whether a compound is an oxide, a fluoride, etc.


In [5]:
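def get_chemistry(d):
    """Classify an entry by its most electronegative element (used as the anion)."""
    # el.X is the electronegativity; the last element after sorting has the largest value.
    anion = sorted(Composition(d["pretty_formula"]).keys(), key=lambda el: el.X)[-1].symbol
    if anion == "O":
        return "Oxide"
    elif anion == "S":
        return "Sulfide"
    elif anion == "F":
        return "Fluoride"
    elif anion == "Cl":
        return "Chloride"
    return "Others"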

import itertools
sdata = sorted(data, key=get_chemistry)
grouped_data = {chem: list(d) for chem, d in itertools.groupby(sdata, key=get_chemistry)}
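
The data are sorted before grouping because `itertools.groupby` only merges consecutive items with the same key. The sketch below shows the same classification and grouping without pymatgen, assuming a small hand-written table of Pauling electronegativities and a regular expression that extracts element symbols from a formula. The names `PAULING_X`, `classify` and `group_by_chemistry` are introduced here for illustration, and the table covers only the elements in the toy records.

```python
import re
from collections import defaultdict

# Pauling electronegativities for a handful of elements only (illustrative subset).
PAULING_X = {"H": 2.20, "C": 2.55, "N": 3.04, "O": 3.44, "F": 3.98,
             "Na": 0.93, "Si": 1.90, "S": 2.58, "Cl": 3.16, "Ca": 1.00,
             "Ti": 1.54, "Fe": 1.83, "Zn": 1.65}

CHEMISTRY = {"O": "Oxide", "S": "Sulfide", "F": "Fluoride", "Cl": "Chloride"}


def classify(formula):
    """Label a formula by its most electronegative element."""
    # An uppercase letter optionally followed by a lowercase one is an element symbol.
    elements = sorted(set(re.findall(r"[A-Z][a-z]?", formula)))
    anion = max(elements, key=lambda el: PAULING_X[el])
    return CHEMISTRY.get(anion, "Others")


def group_by_chemistry(records):
    """Group records by chemistry label; no prior sorting is needed."""
    groups = defaultdict(list)
    for rec in records:
        groups[classify(rec["pretty_formula"])].append(rec)
    return dict(groups)


# Hypothetical toy records, not Materials Project data.
toy_records = [{"pretty_formula": f}
               for f in ["Fe2O3", "ZnS", "CaF2", "NaCl", "SiC", "Na2SO4"]]
toy_groups = group_by_chemistry(toy_records)
for label in sorted(toy_groups):
    print(label, [r["pretty_formula"] for r in toy_groups[label]])
```

Note that under this proxy a sulfate such as Na2SO4 is labeled an oxide, because oxygen is its most electronegative element. Accumulating into a `defaultdict` gives the same groups as sort-then-`groupby` while skipping the sort.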

In [6]:
def make_plots(chemistry):
    sgnums = [d["spacegroup.number"] for d in grouped_data[chemistry]]
    # density=True is the current name for the deprecated `normed` argument.
    hist, bins = np.histogram(sgnums, bins=np.arange(0.5, 231.5, 1), density=True)
    make_plot(hist)

Space Group Distribution in Oxides


In [7]:
make_plots("Oxide")


We can see that oxides have an even higher prevalence of the monoclinic $P12_1/c1$ and triclinic $P\overline{1}$ space groups than the overall distribution for all compounds.

Space Group Distribution in Sulfides


In [8]:
make_plots("Sulfide")


The distribution for sulfides is quite different. Here, the orthorhombic $Pnma$ space group is the most prevalent. In general, it seems that sulfides tend to have higher-symmetry structures than oxides (their space group numbers are generally larger).
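
One rough way to put a number on "space group numbers are generally larger" is to compare the median space group number and the share of entries at or above a threshold for two sets of compounds. The sketch below assumes a threshold of 75, the first tetragonal space group, so the share covers the tetragonal, trigonal, hexagonal and cubic systems. The space group number is only a coarse proxy for symmetry, and the function name `symmetry_summary` and both toy lists are hypothetical, not results from the Materials Project.

```python
from statistics import median

# Space groups 75-230 cover the tetragonal, trigonal, hexagonal and cubic systems.
TETRAGONAL_START = 75


def symmetry_summary(sgnums, threshold=TETRAGONAL_START):
    """Return the median space group number and the percent at or above threshold."""
    if not sgnums:
        raise ValueError("need at least one space group number")
    high = sum(1 for sg in sgnums if sg >= threshold)
    return {"median_sg": median(sgnums),
            "pct_at_or_above": 100.0 * high / len(sgnums)}


# Hypothetical toy lists standing in for two chemistries.
toy_a = [2, 14, 14, 15, 62, 225]
toy_b = [14, 62, 62, 139, 194, 225]
summary_a = symmetry_summary(toy_a)
summary_b = symmetry_summary(toy_b)
print(summary_a)
print(summary_b)
```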

Space Group Distribution in Fluorides


In [9]:
make_plots("Fluoride")


Fluorides have a distribution similar to that of the oxides, with a higher than average prevalence of the monoclinic $P12_1/c1$ and triclinic $P\overline{1}$ space groups.

Space Group Distribution in Chlorides


In [10]:
make_plots("Chloride")


Though the contrast is not as stark as that between oxides and sulfides, we also observe that chlorides tend to have a higher prevalence of high-symmetry space groups than fluorides.

Conclusion

In this notebook, we have done a basic analysis of the prevalence of space groups among inorganic crystals in the Materials Project, followed by a more detailed analysis for some common chemistries. We find that sulfides tend to have a higher prevalence of high-symmetry space groups than oxides, and a similar trend is observed for chlorides versus fluorides. Though we cannot draw firm conclusions from just these few data points, the results suggest that larger anion sizes lead to higher-symmetry crystals.